Very long instruction word processor

ABSTRACT

The invention relates to a very long instruction word (VLIW) processor comprising a plurality of functional units ( 110, 130, 135 ), each for executing an operation, and a VLIW controller ( 100 ) connected to each of said functional units ( 110, 130, 135 ) and adapted to controlling said functional units ( 110, 130, 135 ). The VLIW processor comprises at least one indication means ( 140 ) associated with one of said functional units ( 135 ) and adapted to registering and indicating to the VLIW controller ( 100 ) whether said one functional unit ( 135 ) is idle or operating.

The present invention relates to a very long instruction word (VLIW) processor according to the preamble of appended claim 1.

VLIW processors may be used in a variety of applications ranging from super computers to work stations and personal computers. They may be used as dedicated or programmable processors in work stations, personal computers and video or audio consumer products. They may be application specific processors, i.e. they may be designed to process specific applications in order to enhance the performance of these applications. To this end special functional units are incorporated in the VLIW processor. Each functional unit is designed to process a particular operation depending on the application to be processed. A VLIW controller is connected to each of these functional units in order to control the operating sequence of the functional units. The VLIW controller has to issue the operations performed by the functional units. The set of instructions to be executed by the VLIW processor contains the scheduled operations.

While a functional unit is performing an operation, further operations may not be scheduled on said functional unit if the functional unit is un-pipelined. A new operation can be scheduled by the compiler after a fixed number of cycles corresponding to the initiation interval of the functional unit if it is pipelined. After a functional unit has finished processing, the processing results must be further processed or output from the VLIW processor. The compiler generating the set of instructions needs to know the initiation interval and latency of the functional units at compile time in order to schedule the operations of these units. The initiation interval of a functional unit is the time interval after which a new operation can be initiated on it. The latency of a functional unit is the time it takes for the functional unit to perform its operation. The operations mapped on the functional units sometimes have latencies of the order of 10 to 1000 clock cycles. Further, the latency of the functional unit may be variable. Conventionally, techniques for determining the latency of operations at compile time are used. However, input data dependent latencies cannot be calculated at compile time. Previously, these operations were scheduled assuming a worst-case initiation interval and latency. The worst-case initiation interval is the minimum time interval after which a new operation can be initiated on the functional unit without altering the order in which the outputs arrive. The worst-case latency is the maximum time for the functional unit to perform its operation.
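By way of illustration only, the relation between initiation interval and latency in a statically scheduled unit may be sketched as follows. The function name `static_schedule` and the numeric values are assumptions made for this sketch; they do not appear in the embodiment.

```python
def static_schedule(num_ops, initiation_interval, latency):
    """Return (issue_cycle, result_cycle) pairs for back-to-back operations.

    A new operation is issued every `initiation_interval` cycles and its
    result is available `latency` cycles after issue. An un-pipelined unit
    is modelled here by setting initiation_interval equal to latency.
    """
    if initiation_interval < 1 or latency < 1:
        raise ValueError("initiation interval and latency must be positive")
    return [(k * initiation_interval, k * initiation_interval + latency)
            for k in range(num_ops)]


# Pipelined unit: a new operation every 2 cycles, each taking 5 cycles.
pipelined = static_schedule(3, initiation_interval=2, latency=5)
# Un-pipelined unit: the next operation waits for the previous one to finish.
unpipelined = static_schedule(3, initiation_interval=5, latency=5)
print(pipelined)    # [(0, 5), (2, 7), (4, 9)]
print(unpipelined)  # [(0, 5), (5, 10), (10, 15)]
```

Both parameters must be fixed before the schedule is computed, which is why a data dependent latency cannot be handled in this manner without falling back on the worst case.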

The use of worst-case latencies for scheduling the operations of functional units in a VLIW processor has several drawbacks. Either a large decision tree needs to be scheduled in parallel to fill up other issue slots or the compiler has to introduce no-ops (no operation instructions) in the schedule. Poor schedules result in bad performance of the application processing and lead to higher power consumption.
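The cost of worst-case scheduling may be made concrete with a small calculation. The following is a minimal sketch under stated assumptions: the function name `wasted_cycles` and the sample latencies are hypothetical and serve only to show how many cycles would be filled with no-ops if nothing else could be scheduled in the meantime.

```python
def wasted_cycles(actual_latencies, worst_case_latency):
    """Count cycles left idle when every operation is scheduled for the
    worst-case latency although it actually finishes earlier."""
    for latency in actual_latencies:
        if latency > worst_case_latency:
            raise ValueError("actual latency exceeds the assumed worst case")
    return sum(worst_case_latency - latency for latency in actual_latencies)


# Assumed example: four operations with data dependent latencies.
latencies = [64, 80, 128, 96]
worst_case = 128

scheduled = worst_case * len(latencies)  # cycles reserved by the compiler
needed = sum(latencies)                  # cycles actually required
print(scheduled, needed, wasted_cycles(latencies, worst_case))  # 512 368 144
```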

It is an object of the present invention to improve the performance and power consumption of VLIW processors.

This object is achieved by a VLIW processor and processing method as claimed in claims 1 and 11, respectively.

Accordingly, an indication means is provided which is associated with one functional unit. Preferably the indication means is associated with a functional unit having a variable, data dependent latency. The indication means is adapted to register whether the functional unit is idle or operating. This is indicated to the VLIW controller. Therefore the latency need not be predicted at compile time in order to issue the operations. During the operation the state of the functional unit is reported to the VLIW controller. If the functional unit has finished its operation, the VLIW controller may immediately issue further operations on the functional unit. Thereby no-ops may be avoided. The speed of the application is enhanced.

The VLIW processor according to the present invention may comprise several functional units having variable long latencies. Each of the functional units having variable long latencies may be associated with an indication means that reports the state of the functional unit to the VLIW controller as described in the previous paragraph.

If the VLIW processor need not perform further operations during the operation on the functional unit, the remaining functional units of the processor may rest. Accordingly, power consumption may be reduced even further. The VLIW processor may be brought into a processor-stalling state for long latency operations, or only part of the processor may be stalled, depending on whether any useful operations can be issued in the other issue slots.

Preferably the indication means is adapted to register whether said one functional unit receives data for executing the operation and whether said one functional unit outputs data after executing the operation. This is a very simple and effective way of determining whether the functional unit is operating or not. Whenever the functional unit receives data to be processed, the functional unit changes from an idle state to a busy state. The completion of the operation is evidenced by the writing of the result of the operation into a destination register. Therefore the state of the functional unit may be determined by monitoring the input and/or output of data.

The indication means may comprise an input register for inputting data to said one functional unit and an output register for receiving data output from said one functional unit. The input and output registers each comprise a presence bit indicative of the presence or absence of data in the respective register. Initially the input and output registers are set to an empty state. Whenever data is written into one of the registers, the presence bit indicates the presence of data. Whenever data is output from one of the registers, the presence bit indicates the absence of data. The presence bits of a set of input registers indicate that the functional unit can begin an operation. The subsequent indication of data in the output register indicates the termination of the operation. A single memory operation can read both the data and synchronization information. A separate hardware device need not be provided for determining synchronization information. The presence bit amounts to a hardware overhead of only one bit per word.
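The behaviour of a register carrying a presence bit may be modelled in software as follows. This is a minimal sketch for illustration, not a description of the hardware; the class name `PresenceRegister` and its methods `write`, `read` and `peek` are assumptions introduced here.

```python
class PresenceRegister:
    """One data word plus a single presence bit."""

    def __init__(self):
        self.value = None
        self.present = False  # registers are initially empty

    def write(self, value):
        """Store a word; the presence bit then indicates presence of data."""
        self.value = value
        self.present = True

    def read(self):
        """Output the word; the presence bit then indicates absence of data."""
        if not self.present:
            raise RuntimeError("register is empty")
        value, self.value = self.value, None
        self.present = False
        return value

    def peek(self):
        """Return data and synchronization information in one access."""
        return self.present, self.value


reg = PresenceRegister()
print(reg.peek())  # (False, None)
reg.write(42)
print(reg.peek())  # (True, 42)
print(reg.read())  # 42
print(reg.peek())  # (False, None)
```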

Preferably the input register is adapted to trigger the execution of the operation by said one functional unit depending on the presence of data in the input register. The input register initiates the operation on the availability of data. The VLIW controller is relieved of separately triggering the operation of the functional unit. The input of data to the register ensures simultaneously that the functional unit receives the data to be processed and immediately starts to process the data when available. In addition, if the functional unit can execute more than one function/instruction, a VLIW processor can issue a special command for setting said function/instruction even before input data is available. This means that the input/output time shapes will depend on the command.

The indication means may comprise an input register file containing a plurality of said input registers and an output register file containing a plurality of said output registers. Each input register and each output register contains a presence bit. A whole set of words can be provided to the input register file. Thereby the VLIW controller does not have to provide for new data once the functional unit has processed the data word contained in one input register. The functional unit may either execute an operation when all data arrives in the input register file or can start execution when there is a sufficient number of inputs to proceed with a part of the computation. The triggering of the functional unit may depend suitably on the number of input register presence bits indicating the presence of data. The register files may be FIFOs (First-In-First-Out) or stacks or a combination of them, as disclosed by B. Mesman: Constraint Analysis for DSP Code Generation, PhD thesis, Eindhoven University of Technology, The Netherlands, May 2001. The order in which data is provided to and from the input and output register files may be defined by an access ordering method, as disclosed by C. Alba Pinto: Storage Constraint Satisfaction for embedded Processor Compilers, PhD thesis, Eindhoven University of Technology, The Netherlands, June 2002. As a consequence the VLIW controller needs fewer control bits to control the functional unit.
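Triggering on a predetermined number of presence bits may be sketched as follows. The class name `InputRegisterFile`, the parameter `trigger_count` and the method names are hypothetical and chosen only for this illustration; the FIFO or stack organisations mentioned above are not modelled.

```python
class InputRegisterFile:
    """Input registers with presence bits and a trigger threshold."""

    def __init__(self, size, trigger_count):
        if not 1 <= trigger_count <= size:
            raise ValueError("trigger_count must lie between 1 and size")
        self.values = [None] * size
        self.present = [False] * size  # all registers start empty
        self.trigger_count = trigger_count

    def write(self, index, value):
        """Write a word and set the corresponding presence bit."""
        self.values[index] = value
        self.present[index] = True

    def ready(self):
        """True when enough presence bits are set to start the operation."""
        return sum(self.present) >= self.trigger_count

    def drain(self):
        """Pass all present words to the functional unit and clear them."""
        words = [v for v, p in zip(self.values, self.present) if p]
        self.values = [None] * len(self.values)
        self.present = [False] * len(self.present)
        return words


rf = InputRegisterFile(size=4, trigger_count=2)
rf.write(0, 10)
print(rf.ready())  # False
rf.write(2, 20)
print(rf.ready())  # True
print(rf.drain())  # [10, 20]
```

Setting `trigger_count` equal to `size` corresponds to starting only when all data has arrived, whereas a smaller value corresponds to starting with part of the computation.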

If the same input data is to be used several times by the functional unit, a temporary register may be provided in the VLIW processor. The temporary register is connected to the functional unit in order to store data to be used repeatedly by said one functional unit. A normal register file may also be used as a temporary register.

If the VLIW processor comprises a second functional unit which is adapted to execute an operation on the data output from said one functional unit, the indication means may be connected to the second functional unit in order to indicate whether said one functional unit is outputting data. Thereby the operation of the second functional unit may be triggered by the indication means in the event that the required data is output from the functional unit associated with the indication means. The control of the second functional unit may be performed by the indication means. As a consequence the VLIW controller is relieved of the task of triggering the second functional unit.

An embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 1 shows a VLIW processor according to an embodiment of the present invention.

FIG. 2 shows in more detail the indication means 140 associated with the application specific unit 135.

FIG. 3 shows the structure of both register files 160 and 170.

FIG. 1 depicts the VLIW processor according to the embodiment of the present invention. The VLIW processor comprises a VLIW controller 100 that is connected to a number of functional units 110, 130 and 135. The VLIW controller 100 issues in particular the operation of the functional units 110, 130 and 135. An interconnection network 120 connects the functional units 110, 130 and 135 directly in order to facilitate data transfer between these functional units. A global register file 160 stores values produced by the functional units 110, 130 and 135. The purpose of the global register file is to provide a way of communicating data produced by one of the functional units 110, 130, 135 to the other functional units 110, 130 and 135. Reference sign 110 depicts standard VLIW functional units. The units 110 may encompass standard arithmetic and logical units (ALUs), a constant generating unit (CONST), a memory unit (MEM) for data and an instruction memory (INSTR MEM). These units may be used in a large number of applications.

The functional units 130 and 135 are application specific units (ASUs). They are designed to perform specific operations geared to a particular application. An example for such an application is a hybrid encoder with embedded compression as described in Kleihorst R. P., and R. J. van der Vleuten, DCT-domain embedded memory compression for hybrid video coders, Journal of VLSI signal processing systems, Vol. 24, page 31-41, 2000. Such an application calls for a number of ASUs, such as a discrete cosine transform (DCT) for data transformation and inverse discrete cosine transform (IDCT) for data inverse-transformation as well as encoder and decoder units (ENC and DEC) for performing bit-plane by bit-plane encoding and decoding of DCT coefficients. The ENC and DEC units can have processing times between 64 and 128 clock cycles depending on the input data. Reference sign 135 shows an ASU having a variable long latency behavior.

In order to schedule the operation of the ASU 135 an indication means 140 is provided. The indication means 140 detects the state of the ASU 135. In case the ASU is executing an operation, the indication means 140 sends a signal to a hold control unit 150. Hereupon the unit 150 generates a hold signal which is transferred to the VLIW controller 100. The VLIW controller 100 halts the rest of the VLIW processor as long as the hold signal is received. This means that the ASU 135 performs its operation, while the rest of the VLIW processor remains unchanged when it attempts to read an output produced by the ASU 135. The hold operation leads to a reduction of the power consumption of the VLIW processor during the latency of the ASU 135. Once the variable latency ASU 135 is ready with the required output, the hold signal is reset by the indication means 140. Hereupon the rest of the processor is reactivated and consumes the output of the ASU 135. The processing speed is optimized since the VLIW processor continues processing the application in due time.
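The effect of the hold signal on the issue of further operations may be illustrated by a simple cycle-by-cycle model. This is a sketch under stated assumptions: the function name `simulate_hold`, the trace labels and the one-operation-per-cycle issue rate are hypothetical simplifications and not part of the embodiment.

```python
def simulate_hold(asu_latency, follow_up_ops):
    """Return a per-cycle trace of what the rest of the processor does.

    While the ASU is still operating, the hold signal is asserted and the
    rest of the processor is halted ("hold"). Once the ASU has finished,
    the hold signal is reset and the remaining operations are issued,
    one per cycle ("issue").
    """
    trace = []
    remaining = asu_latency  # cycles until the ASU output is ready
    issued = 0
    while issued < follow_up_ops:
        hold = remaining > 0  # the indication means reports "operating"
        if hold:
            remaining -= 1
            trace.append("hold")
        else:
            issued += 1
            trace.append("issue")
    return trace


# The same program adapts to whatever latency the input data causes.
print(simulate_hold(3, 2))  # ['hold', 'hold', 'hold', 'issue', 'issue']
print(simulate_hold(1, 2))  # ['hold', 'issue', 'issue']
```

In this model the number of held cycles equals the actual latency of the operation, so no cycles need to be reserved at compile time for a worst case that does not occur.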

FIG. 2 shows in greater detail the structure of the indication means 140 associated with the ASU 135 having a variable latency. The indication means comprises two register files 160 and 170. Data to be processed is input in ASU 135 via the input register file 160. The result of processing the data is output to the output register file 170. The indication means further comprises a detection unit 180 connected to the register files 160 and 170. The detection unit 180 detects whether data is output from the register file 160 to the ASU 135 and whether data is received from the ASU 135 in register file 170. As soon as the detection unit 180 detects the input of data in the ASU 135, the detection unit 180 generates a signal to the hold unit. The detection unit 180 stops sending the signal to the hold unit once it detects the output of data from the ASU 135.
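The behaviour of the detection unit may be summarised as a two-state machine. The following sketch is illustrative only; the class name `DetectionUnit`, the method names and the attribute `hold_request` are assumptions introduced here and are not taken from the figures.

```python
class DetectionUnit:
    """Tracks whether the associated ASU is idle or operating."""

    def __init__(self):
        self.busy = False  # the ASU is idle initially

    def on_input_consumed(self):
        """Data was passed from the input register file to the ASU."""
        self.busy = True

    def on_output_written(self):
        """Data from the ASU arrived in the output register file."""
        self.busy = False

    @property
    def hold_request(self):
        """Signal sent to the hold unit while the ASU is operating."""
        return self.busy


unit = DetectionUnit()
print(unit.hold_request)  # False
unit.on_input_consumed()
print(unit.hold_request)  # True
unit.on_output_written()
print(unit.hold_request)  # False
```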

FIG. 3 shows schematically the structure of the register files 160 and 170, which are identical. The register file contains a number of registers 200. Each register contains a presence bit 210. All the registers are initialized to the empty state. Whenever data is read into one register the corresponding presence bit 210 changes its state in order to indicate the presence of a data word. The output of data from a register has the effect that the register becomes empty and the presence bit changes its state. The output of data from the input register to the ASU is triggered by the availability of input data. This means that the input register file instructs the ASU to start computation when a single or a predetermined number of presence bits indicate the presence of input data. Simultaneously the initialization of an operation is reported to the detection unit 180.

1. A VLIW processor comprising a plurality of functional units, each for executing an operation, and a VLIW controller connected to each of said functional units and adapted to control said functional units characterized by at least one indication means associated with one of said functional units and adapted to register and indicate to the VLIW controller whether said one functional unit is idle or operating.
2. The VLIW processor of claim 1, wherein said indication means is adapted to register whether said one functional unit receives data for executing its operation and whether said one functional unit outputs data after executing its operation.
3. The VLIW processor of claim 2, wherein said indication means comprises an input register for inputting data to said one functional unit and an output register for receiving data output from said one functional unit, said input and output register each comprising a presence bit indicative of the presence or absence of data in the respective register.
4. The VLIW processor of claim 3, wherein said input register is adapted to trigger the execution of the operation by said one functional unit, if data is present in the input register.
5. The VLIW processor of claim 3, wherein said indication means comprises an input register file having a plurality of said input registers and an output register file having a plurality of said output registers.
6. The VLIW processor of claim 5, wherein the input register file is adapted to trigger the execution of the operation by said one functional unit, if a predetermined number of the input registers contain data.
7. The VLIW processor of claim 2, comprising a temporary register for storing data to be used repeatedly by said one functional unit, said temporary register being connected to said one functional unit.
8. The VLIW processor of claim 5, wherein the output register file is adapted to trigger the execution of the operation of a second functional unit, if a predetermined number of output registers contain data.
9. The VLIW processor of claim 1, wherein said one functional unit has a variable long latency.
10. The VLIW processor of claim 1, wherein the latency of the one functional unit depends on the data to be processed by said functional unit.
11. Method of processing data in a VLIW processor, comprising the steps: registering whether a functional unit is idle or operating; and indicating to said VLIW controller whether said functional unit is idle or operating.
12. The method of claim 11, wherein said registering step comprises the steps registering whether said one functional unit receives data for executing its operation and whether said one functional unit outputs data after executing its operation.
13. The method of claim 12, comprising the steps of indicating to the VLIW controller that the functional unit receives data, and indicating to the VLIW controller that the functional unit outputs data.